A corpus of European Portuguese child and child-directed speech
Abstract
We present a corpus of European Portuguese child and child-directed speech. The corpus expands an existing database (Santos, 2006). It includes around 52 hours of child-adult interaction and now contains 27,595 child utterances and 70,736 adult utterances. The corpus was transcribed according to the CHILDES system (Child Language Data Exchange System) using the CLAN software (MacWhinney, 2000). The corpus itself is a valuable resource for the study of lexical, syntactic, and discourse acquisition. In this paper, we also show how we used an existing part-of-speech tagger trained on written material (Généreux, Hendrickx & Mendes, 2012) to automatically lemmatize and tag child and child-directed speech and to generate a line with part-of-speech information compatible with the CLAN interface. We show that a POS tagger trained on written language can be applied to spoken material with minimal effort, with only a small number of written rules assisting the statistical model.
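As a rough illustration of what generating a part-of-speech line for a CLAN-readable transcript can look like, the following Python sketch inserts a dependent tier under each speaker line of a CHILDES-style transcript. It is not the authors' pipeline: the tier name `%xpos`, the `TAG|lemma` token layout, the toy lexicon, and the function names are all assumptions made for this example, and a real tagger would replace the dictionary lookup.

```python
import re

# Hypothetical toy lexicon standing in for a real tagger and lemmatizer.
# The (lemma, tag) pairs are illustrative only.
TOY_LEXICON = {
    "o": ("o", "DET"),
    "gato": ("gato", "N"),
    "dorme": ("dormir", "V"),
}

# Speaker lines in CHILDES-style transcripts start with "*", a speaker code,
# a colon, and a tab (for example "*CHI:\t...").
MAIN_LINE = re.compile(r"^\*([A-Z0-9]+):\t(.*)$")

# Utterance terminators that should not receive a tag in this sketch.
TERMINATORS = {".", "?", "!"}


def toy_tag(token):
    """Return 'TAG|lemma' for a token; unknown words get the tag UNK."""
    lemma, tag = TOY_LEXICON.get(token.lower(), (token.lower(), "UNK"))
    return f"{tag}|{lemma}"


def add_pos_tier(lines, tier="%xpos"):
    """Copy the transcript, adding one dependent tier after each speaker line."""
    out = []
    for line in lines:
        out.append(line)
        match = MAIN_LINE.match(line)
        if not match:
            continue  # headers such as @Begin and existing tiers pass through
        words = [w for w in match.group(2).split() if w not in TERMINATORS]
        if words:
            out.append(f"{tier}:\t" + " ".join(toy_tag(w) for w in words))
    return out


transcript = ["@Begin", "*CHI:\to gato dorme .", "@End"]
for row in add_pos_tier(transcript):
    print(row)
```

Keeping the tagger behind a single function (`toy_tag` here) means that a statistical model and any correction rules could be swapped in without touching the code that writes the tier.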
Related resources
CEPLEXicon ― A Lexicon of Child European Portuguese
CEPLEXicon (version 1.1) is a child lexicon resulting from the automatic tagging of two child corpora: the corpus Santos (Santos, 2006; Santos et al. 2014) and the corpus Child – Adult Interaction (Freitas et al. 2012), which integrates information from the corpus Freitas (Freitas, 1997). This lexicon includes spontaneous speech produced by seven children (1;02.00 to 3;11.12) during approximate...
Grammar and frequency effects in the acquisition of prosodic words in European Portuguese.
This paper investigates the acquisition of prosodic words in European Portuguese (EP) through analysis of grammatical and statistical properties of the target language and child speech. The analysis of grammatical properties shows that there are solid cues to the prosodic word (PW) in EP, and the presence of early word-based phonology in child speech shows that EP children are aware of these cu...
A Longitudinal Study of Prosodic Exaggeration in Child-directed Speech
We investigate the role of prosody in child-directed speech of three English speaking adults using data collected for the Human Speechome Project, an ecologically valid, longitudinal corpus collected from the home of a family with a young child. We looked at differences in prosody between child-directed and adult-directed speech. We also looked at the change in prosody of child-directed speech ...
Prosodic Features from Large Corpora of Child-Directed Speech as Predictors of the Age of Acquisition of Words
The impressive ability of children to acquire language is a widely studied phenomenon, and the factors influencing the pace and patterns of word learning remain a subject of active research. Although many models predicting the age of acquisition of words have been proposed, little emphasis has been directed to the raw input children receive. In this work we present a comparatively large-scale ...